The remote control unit supports multi-modal dialog with 
the user, through which the user can easily select programs 
for viewing or recording. The remote control houses a 
microphone into which the user can input natural language 
speech. The input speech is recognized and interpreted by a 
natural language parser that extracts the semantic content of 
the user's speech. The parser works in conjunction with an 
electronic program guide, through which the remote control 
system is able to ascertain what programs are available for 
viewing or recording and supply appropriate prompts to the 
user. In one embodiment, the remote control includes a 
touch screen display upon which the user may view prompts 
or make selections by pen input or tapping. Selections made 
on the touch screen automatically limit the context of the 
ongoing dialog between user and remote control, allowing 
the user to interact naturally with the unit. The remote 
control unit can control virtually any audio-video 
component, including those designed before the current 
technology. The remote control system can be packaged 
entirely within the remote control handheld unit, or compo- 
nents may be distributed in other systems attached to the 
user's multimedia equipment. 

26 Claims, 3 Drawing Sheets 
UNIVERSAL REMOTE CONTROL 
ALLOWING NATURAL LANGUAGE 
MODALITY FOR TELEVISION AND 
MULTIMEDIA SEARCHES AND REQUESTS 

This application is related to U.S. Pat. No. 6,324,512 
issued on Nov. 27, 2001 and entitled "System and Method 
for Allowing Family Members to Access TV Contents and 
Program Media Recorder Over Telephone Or Internet". 

BACKGROUND AND SUMMARY OF THE 
INVENTION 

The ubiquitous remote control, often a multitude of them, 
has found its way onto virtually every coffee table in 
television viewing rooms throughout the world. Few television 
viewers have not experienced the frustration of trying 
to perform even a simple command, such as turning on the 
television and watching a pre-recorded movie, only to be 
thwarted because they cannot figure out which button 
or buttons to press on which remote control unit. 

In an attempt to address the proliferation of multiple 
remote controls, many companies offer a universal remote 
control that is able to operate a variety of different audio- 
video components. These remote controls, of necessity, 
feature a panoply of buttons, many of them having dual 
functions, in order to control the principal functions of all 
devices in the user's multimedia setup. 

While the conventional universal remote control may 
eliminate the need for having multiple remote control units 
on the coffee table, it does little to simplify the user's 
interaction with his or her audio-video or multimedia system. 
On the contrary, most universal remote control units are 
so complex that they actually impede the user's ability to 
control the equipment. 

The present invention tackles this problem through speech 
recognition technology and sophisticated natural language 
parsing components that allow the user to simply speak 
into the remote control unit and have his or her commands 
carried out. While the spoken commands can be simple 
commands such as "Play VCR" or "Record Channel 6", the 
natural language parser supports far more complex commands 
than this. For example, the user could speak: "Show me a 
funny movie starring Marilyn Monroe." Using the speech 
recognition and parser components, the system will search 
through an electronic program guide or movie database and 
can respond to the user (for instance) that "Some Like It 
Hot" will be playing next Friday. The user could then, for 
example, instruct the system to record that movie when it 
comes on. 

Recording commands need not be limited to the entire 
movie or program. Rather, the user could enter a command 
such as: "Record the last five minutes of tonight's Toronto- 
Los Angeles baseball game." Again, the speech recognition 
and parser components convert this complex command into 
a sequence of actions that cause the recording device in the 
user's system to make the requested recording at the appro- 
priate time. 

The remote control of the invention can be constructed as 
a self-contained unit having all of the parser and speech 
recognition components on board, or it may be manufac- 
tured in multiple components, allowing some of the more 
complex computational operations to be performed by a 
processor located in a television set, set top box, or auxiliary 
multimedia control unit. In the latter case, the hand-held 
remote and the remote command unit communicate with 
each other by wireless transmission. Preferably, the hand-held 
remote control unit includes an infrared port through 
which the remote control can interact with older equipment 
in the user's multimedia setup. Thus the remote control of 
the invention even allows sophisticated natural language 
speech commands to be given to those older audio-video 
components. 

For a more complete understanding of the invention, its 
objects and advantages, refer to the following specification 
and to the accompanying drawings. 

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 is a plan view of an embodiment of the remote 
control in accordance with the invention; 

FIG. 2 is a block diagram illustrating the components of 
the presently preferred embodiment; 

FIG. 3 is a block diagram depicting the components of the 
natural language parser of the presently preferred embodiment 
of the invention; and 

FIG. 4 is a block diagram depicting the components of the 
local parser of the presently preferred embodiment of the 
invention. 

DESCRIPTION OF THE PREFERRED 
EMBODIMENTS 

The remote control of the invention can take many forms. 
An exemplary embodiment is illustrated in FIG. 1, where the 
remote control is shown at 10 and an exemplary television 
set is shown at 12. In the preferred embodiment the remote 
control 10 and television 12 communicate wirelessly with 
one another through a suitable radio frequency link or 
infrared link. 

The remote control is designed to operate not only more 
modern digital interactive television and hard disk recorder 
equipment, but also older models of televisions, VCRs, 
DVD and laser disk players, surround sound processors, 
tuners, and the like. Accordingly, the remote control includes 
a light-emitting diode transmitter 14 with which the unit 
may communicate with all popular home entertainment and 
multimedia components. This same transmitter can serve as 
the communication link between the remote control and the 
television (to implement some of the features described 
herein). 

In an alternate embodiment, the remote control 10 and 
television 12 communicate through a bidirectional data 
communication link that allows the speech recognition and 
natural language parsing components to be distributed 
among the remote control, television and optionally other 
components within the multimedia system. 

Although not required to implement the speech-enabled 
dialog system, the presently preferred remote control 10 also 
includes a lighted display 16 that may supply prompts to the 
user as well as information extracted from the electronic 
program guide. The screen may be touch sensitive or tap 
sensitive, allowing the user to select menu options and 
provide handwritten input through the stylus 18. Users who 
regularly employ pen-based personal digital assistant (PDA) 
devices will find the stylus input modality particularly 
useful. 

The remote control 10 also includes a complement of 
pushbuttons 20, for performing numeric channel selection 
and other commonly performed operations, such as increasing 
and decreasing the audio volume. A jog shuttle wheel 22 
may also be included, to allow the user to use this feature in 
conjunction with recorders and disk players. 

By virtue of the bi-directional link between remote con- 
trol 10 and television 12, the system is capable of displaying 
on-screen prompts and program guide information on both 
the television monitor screen, as illustrated at 24, and on the 
display screen 16 of the remote control. If desired, the 
on-screen display 24 can be suppressed, so that the user may 
make menu item selections and electronic program guide 
selections using the remote control screen, without the need 
to display the same information on the television while 
watching a program. 

A particularly useful aspect of remote control 10 is its 
natural language speech modality. The remote control is 
provided with a microphone as at 26. The user speaks in 
natural language sentences, and these spoken utterances are 
picked up by microphone 26 and supplied to a sophisticated 
speech understanding system. The speech understanding 
system allows the user to give the television set and other 
associated equipment (such as hard disk recorders or VCR 
recorders) search and record commands in interactive, 
natural language. 

As an example of a spoken search command, the user 
could say into the microphone, "Show me a funny movie 
starring Marilyn Monroe." Using its speech recognition and 
parser components, this system searches through an electronic 
program guide or movie database and responds to the 
user whether any options meet the user's request. The 
system might respond, for instance, that "Some Like It Hot" 
will be playing next Friday. 

Armed with this information, the user may elect to record 
the movie, by simply speaking, "Please record Some Like It 
Hot." 

Recording instructions can be quite explicit, thanks to the 
sophisticated natural language system of the invention. 
Thus, the user could enter a complex record command such 
as, "Record the last five minutes of tonight's Toronto-Los 
Angeles baseball game." Again, the speech recognition 
and parser components convert this complex command into 
a sequence of actions that the recorder within the system will 
carry out. 

Referring to FIG. 2, the major functional components of 
the remote control system will now be described. In this 
regard, it is important to understand that the components of 
the remote control system can be packaged entirely within 
the remote control device itself, or one or more of these 
components can be distributed or implemented in other 
components within the system. The more processor-intensive 
functions of the system may be performed, for 
example, by processors located in larger, more powerful 
components such as set top boxes, interactive digital television 
sets, multimedia recording systems, and the like. 

For example, the microphone and basic components of 
the speech recognizer may be housed in the remote control 
unit, with the remaining components housed in another 
piece of equipment. If desired, the speech recognizer itself 
can be subdivided into components, some of which are 
housed in the remote control and others of which are housed 
elsewhere. By way of example, the component housed in the 
remote control may process the input speech by extracting 
speech features upon which the speech models are trained. 
The remote control then transmits these extracted features to 
the component located elsewhere for further speech recognition 
processing. Alternatively, the input speech may simply 
be transmitted by the remote control in the audio domain 
to a speech recognition component located elsewhere. These 
are of course only a few possible examples of how the 
functionality of the invention may be deployed in distributed 
fashion. 

Speech input supplied through microphone 26 is first 
digitized and fed to speech recognizer module 40. The 
output of speech recognizer module 40 is supplied to the 
natural language parser 42. This parser works in conjunction 
with a set of grammars 44 that allow the system to interpret 
the meaning behind the user's spoken instructions. In the 
presently preferred embodiment these grammars are 
goal-oriented grammars comprising a collection of frame 
sentences having one or more slots that the system will fill in 
based upon the words recognized from the user's input 
speech. More detail about the presently preferred parser and 
these goal-oriented grammars is presented below. 

The natural language parser 42 has access to a stored 
semantic representation of the electronic program guide 46. 
The electronic program guide can be downloaded from the 
internet or supplied via the entertainment system's cable or 
satellite link. These sources of electronic program guide 
information are illustrated generally at 50. Typically, the 
television tuner 52 may be used to obtain this information 
and furnish it to the semantic representation stored at 46. 
Alternatively, this information could be supplied by telephone 
connection to a suitable Internet service provider or 
dedicated electronic program guide service provider. 

The typical electronic program guide represents a complex 
hierarchical structure that breaks down different types of 
program content according to type. Thus a program guide 
may divide programs into different categories, such as 
movies, sports, news, weather, and the like. These categories 
may further be subdivided. Thus movies may be subdivided 
into categories such as comedies, drama, science fiction and 
so forth. A semantic representation of the electronic program 
contents is stored at 46, based on the same goal-oriented 
grammar structure used by the natural language 
parser. This allows the parser to readily find information 
about what is available for viewing. If the user has asked for 
comedy movies, the comedy movie portion of the semantic 
representation is accessed by the parser, and the available 
programs falling under this category may then be displayed 
to the user as will be more fully described below. 

In some instances the natural language parser will immediately 
identify a program the user is interested in watching. 
In other instances, there may be multiple choices, or no 
choices. To accommodate these many possibilities, the system 
includes a dialog manager 54. The dialog manager 
interfaces with the natural language parser 42, and generates 
interactive prompts for synthesized speech or on-screen 
presentation to the user. These prompts are designed to elicit 
further information from the user, to help the natural language 
parser find program offerings the user may be interested 
in. The dialog manager has a user profile data store 56, 
which stores information about the user's previous information 
selections, and also information about how the user 
likes to have the information displayed. This data store thus 
helps the dialog manager tune its prompts to best suit the 
user's expectations. 

The presently preferred natural language parser will now 
be described. FIG. 3 depicts components of the natural 
language parser 42 in more detail. In particular, speech 
understanding module 128 includes a local parser 160 to 
identify predetermined relevant task-related fragments. 
Speech understanding module 128 also includes a global 
parser 162 to extract the overall semantics of the speaker's 
request. 

The local parser 160 utilizes in the preferred embodiment 
small and multiple grammars along with several passes and 
a unique scoring mechanism to provide parse hypotheses. 
For example, the novel local parser 102 recognizes according 
to this approach phrases such as dates, names of people, 
and movie categories. If a speaker utters "record me a 
comedy in which Mel Brooks stars and is shown before 
January 23rd", the local parser recognizes: "comedy" as 
being a movie category; "January 23rd" as a date; and "Mel 
Brooks" as an actor. The global parser assembles those items 
(movie category, date, etc.) together and recognizes that the 
speaker wishes to record a movie with certain constraints. 

Speech understanding module 128 includes knowledge 
database 163 which encodes the semantics of a domain (i.e., 
goal to be achieved). In this sense, knowledge database 163 
is preferably a domain-specific database as depicted by 
reference numeral 165 and is used by dialog manager 130 to 
determine whether a particular action related to achieving a 
predetermined goal is possible. 

The preferred embodiment encodes the semantics via a 
frame data structure 164. The frame data structure 164 
contains empty slots 166 which are filled when the semantic 
interpretation of global parser 162 matches the frame. For 
example, a frame data structure (whose domain is tuner 
commands) includes an empty slot for specifying the 
viewer-requested channel for a time period. If viewer 120 
has provided the channel, then that empty slot is filled with 
that information. However, if that particular frame needs to 
be filled after the viewer has initially provided its request, 
then dialog manager 130 instructs computer response module 
134 to ask viewer 120 to provide a desired channel. 

The frame data structure 164 preferably includes multiple 
frames which each in turn have multiple slots. One frame 
may have slots directed to attributes of a movie, director, and 
type of movie. Another frame may have slots directed to 
attributes associated with the time in which the movie is 
playing, the channel, and so forth. 

The following reference discusses global parsers and 
frames: R. Kuhn and R. D. Mori, Spoken Dialogues with 
Computers (Chapter 14: Sentence Interpretation), Academic 
Press, Boston (1998). 

Dialog manager 130 uses dialog history data file 167 to 
assist in filling in empty slots before asking the speaker for 
the information. Dialog history data file 167 contains a log 
of the conversation which has occurred through the device 
of the present invention. For example, if a speaker utters "I'd 
like to watch another Marilyn Monroe movie," the dialog 
manager 130 examines the dialog history data file 167 to 
check what movies the user has already viewed or rejected 
in a previous dialog exchange. If the speaker had previously 
rejected "Some Like It Hot", then the dialog manager 130 
fills the empty slot of the movie title with movies of a 
different title. If a sufficient number of slots have been filled, 
then the present invention will ask the speaker to verify and 
confirm the program selection. Thus, if any assumptions 
made by the dialog manager 130 through the use of dialog 
history data file 167 prove to be incorrect, then the speaker 
can correct the assumption. 

The natural language parser 42 analyzes and extracts 
semantically important and meaningful topics from a loosely 
structured, natural language text which may have been 
generated as the output of an automatic speech recognition 
system (ASR) used by a dialogue or speech understanding 
system. The natural language parser 42 translates the natural 
language text input to a new representation by generating 
well-structured tags containing topic information and data, 
and associating each tag with the segments of the input text 
containing the tagged information. In addition, tags may be 
generated in other forms such as a separate list, or as a 
semantic frame. 

Robustness is a feature of the natural language parser 42 
as the input can contain grammatically incorrect English 
sentences, for the following reasons: the input to the 
recognizer is casual, dialog-style natural speech that can contain 
broken sentences and partial phrases, and the speech recognizer 
can introduce insertion, omission, or misrecognition errors 
even when the speech input is considered correct. The 
natural language parser 42 deals robustly with all types of 
input and extracts as much information as possible. 

FIG. 4 depicts the different components of the local parser 
160 of the natural language parser 42. The natural language 
parser 42 preferably utilizes generalized parsing techniques 
in a multi-pass approach as a fixed-point computation. Each 
topic is described as a context-sensitive LR (left-right and 
rightmost derivation) grammar, allowing ambiguities. The 
following are references related to context-sensitive LR 
grammars: A. Aho and J. D. Ullman, Principles of Compiler 
Design, Addison Wesley Publishing Co., Reading, Mass. 
(1977); and N. Tomita, Generalized LR Parsing, Kluwer 
Academic Publishers, Boston, Mass. (1991). 

At each pass of the computation, a generalized parsing 
algorithm is used to generate preferably all possible (both 
complete and partial) parse trees independently for each 
targeted topic. Each pass potentially generates several alternative 
parse-trees, each parse-tree representing a possibly 
different interpretation of a particular topic. The multiple 
passes through preferably parallel and independent paths 
result in a substantial elimination of ambiguities and overlap 
among different topics. The generalized parsing algorithm is 
a systematic way of scoring all possible parse-trees so that 
the (N) best candidates are selected utilizing the contextual 
information present in the system. 

Local parsing system 160 is carried out in three stages: 
lexical analysis 220; parallel parse-forest generation for each 
topic (for example, generators 230 and 232); and analysis 
and synthesis of parsed components as shown generally by 
reference numeral 234. 

Lexical Analysis: 

A speaker utters a phrase that is recognized by an automatic 
speech recognizer 217 which generates input sentence 
218. Lexical analysis stage 220 identifies and generates tags 
for the topics (which do not require extensive grammars) in 
input sentence 218 using lexical filters 226 and 228. These 
include, for example, movie names; category of movie; 
producers; names of actors and actresses; and the like. A 
regular-expression scan of the input sentence 218 using the 
keywords involved in the mentioned exemplary tags is 
typically sufficient at this level. Also performed at this stage 
is the tagging of words in the input sentence that are not part 
of the lexicon of the particular grammar. These words are 
indicated using an X-tag so that such noise words are 
replaced with the letter "X". 

Parallel Parse-forest Generation: 

The parser 42 uses a high-level general parsing strategy to 
describe and parse each topic separately, and generates tags 
and maps them to the input stream. Due to the nature of 
unstructured input text 218, each individual topic parser 
preferably accepts as large a language as possible, ignoring 
all but important words, dealing with insertion and deletion 
errors. The parsing of each topic involves designing 
context-sensitive grammar rules using a meta-level specification 
language, much like the ones used in LR parsing. Examples 
of grammars include grammar A 240 and grammar B 242. 
Using the present invention's approach, topic grammars 240 
and 242 are described as if they were an LR-type grammar, 
containing redundancies and without eliminating shift and 
reduce conflicts. The result of parsing an input sentence is all 
possible parses based on the grammar specifications. 

Generators 230 and 232 generate parse forests 250 and 
252 for their topics. Tag-generation is done by synthesizing 
actual information found in the parse tree obtained during 
parsing. Tag generation is accomplished via tag and score 
generators 260 and 262 which respectively generate tags 264 
and 266. Each identified tag also carries information about 
what set of input words in the input sentence are covered by 
the tag. Subsequently the tag replaces its cover-set. In the 
preferred embodiment, context information 267 is utilized 
for tag and score generations, such as by generators 260 and 
262. Context information 267 is utilized in the scoring 
heuristics for adjusting weights associated with a heuristic 
scoring factor technique that is discussed below. Context 
information 267 preferably includes word confidence vector 
268 and dialogue context weights 269. However, it should 
be understood that the parser 42 is not limited to using both 
word confidence vector 268 and dialogue context weights 
269, but also includes using one to the exclusion of the other, 
as well as not utilizing context information 267. 
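
The relationship between a tag and its cover-set can be illustrated with a minimal Python sketch. The `Tag` class, the `replace_cover_set` function, the topic name "date" and the tag notation `<topic:value>` are all hypothetical names chosen for illustration; the specification does not prescribe a concrete data layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    topic: str            # hypothetical topic name, e.g. "date"
    value: str            # information synthesized from the parse tree
    cover_set: frozenset  # indices of the input words the tag covers
    score: float = 0.0    # would be filled in by a tag and score generator


def replace_cover_set(words, tag):
    """Return a copy of `words` in which the tag stands in for its cover-set."""
    first = min(tag.cover_set)
    out = []
    for i, word in enumerate(words):
        if i == first:
            # The tag token takes the place of the first covered word.
            out.append(f"<{tag.topic}:{tag.value}>")
        elif i not in tag.cover_set:
            # Words outside the cover-set are kept unchanged.
            out.append(word)
    return out


words = "record a comedy shown before January 23rd".split()
date_tag = Tag("date", "before-01-23", frozenset({4, 5, 6}))
tagged = replace_cover_set(words, date_tag)
print(tagged)
```

In this sketch the three words "before January 23rd" collapse into a single tag token, which is the sense in which the tag replaces its cover-set.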

Automatic speech recognition process block 217 generates 
word confidence vector 268 which indicates how well 
the words in input sentence 218 were recognized. Dialog 
manager 130 generates dialogue context weights 269 by 
determining the state of the dialogue. For example, dialog 
manager 130 asks a user about a particular topic, such as 
what viewing time is preferable. Due to this request, dialog 
manager 130 determines that the state of the dialogue is 
time-oriented. Dialog manager 130 provides dialogue context 
weights 269 in order to inform the proper processes to 
more heavily weight the detected time-oriented words. 
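
One way the two kinds of context information might be combined is sketched below in Python. The function names, the boost factor of 2.0 and the choice to multiply the mean word confidence by the topic weight are assumptions made for illustration; the specification states only that time-oriented words are weighted more heavily when the dialogue state is time-oriented, and that either source of context may be left out.

```python
def dialogue_context_weights(dialogue_state, topics, boost=2.0):
    """Weight the topic that matches the dialogue state more heavily."""
    return {t: (boost if t == dialogue_state else 1.0) for t in topics}


def contextual_score(cover_set, topic, word_confidence=None, context_weights=None):
    """Mean ASR confidence over the cover-set times the topic's weight.

    Either source of context may be omitted, in which case it is neutral.
    """
    if word_confidence is None:
        confidence = 1.0
    else:
        confidence = sum(word_confidence[i] for i in cover_set) / len(cover_set)
    weight = 1.0 if context_weights is None else context_weights.get(topic, 1.0)
    return confidence * weight


# The user answers "around eight" after being asked for a viewing time.
# "eight" (index 1) could be a channel or a time; the state is time-oriented.
word_confidence = [0.9, 0.8]  # confidences for "around", "eight"
weights = dialogue_context_weights("time", ["time", "channel"])
time_score = contextual_score({0, 1}, "time", word_confidence, weights)
channel_score = contextual_score({1}, "channel", word_confidence, weights)
print(time_score > channel_score)
```

Under these assumed numbers the time interpretation outranks the channel interpretation, which is the effect the dialogue context weights are meant to produce.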
Synthesis of Tag-components: 

The topic spotting parser of the previous stage generates 
a significant amount of information that needs to be analyzed 
and combined together to form the final output of the 
local parser. The parser 42 is preferably as "aggressive" as 
possible in spotting each topic, resulting in the generation of 
multiple tag candidates. Additionally, in the presence of 
numbers or certain keywords, such as "between", "before", 
"and", "or", "around", etc., and especially if these words 
have been introduced or dropped due to recognition errors, it 
is possible to construct many alternative tag candidates. For 
example, an input sentence could have insertion or deletion 
errors. The combining phase determines which tags form a 
more meaningful interpretation of the input. The parser 42 
defines heuristics and makes a selection based on them using 
an N-Best candidate selection process. Each generated tag 
corresponds to a set of words in the input word string, called 
the tag's cover-set. 

A heuristic is used that takes into account the cover-sets 
of the tags used to generate a score. The score roughly 
depends on the size of the cover-set, the sizes, in number 
of words, of the gaps within the covered items, and the 
weights assigned to the presence of certain keywords. In the 
preferred embodiment, ASR-derived confidence vector and 
dialog context information are utilized to assign priorities to 
the tags. For example, applying channel-tags parsing first 
potentially removes channel-related numbers that are easier 
to identify uniquely from the input stream, and leaves fewer 
numbers to create ambiguities with other tags. Preferably, 
dialog context information is used to adjust the priorities. 
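
The cover-set heuristic can be sketched as follows. The linear form of the score, the gap penalty of 0.5 and the keyword weights are assumptions chosen for illustration; the specification says only that the score roughly depends on these three quantities.

```python
def cover_set_score(cover_set, words, keyword_weights, gap_penalty=0.5):
    """Score a tag from its cover-set: reward size and keywords, penalize gaps."""
    positions = sorted(cover_set)
    size = len(positions)
    # Number of uncovered words lying between consecutive covered words.
    gaps = sum(b - a - 1 for a, b in zip(positions, positions[1:]))
    keywords = sum(keyword_weights.get(words[i], 0.0) for i in positions)
    return size - gap_penalty * gaps + keywords


words = "record channel six before eight tonight".split()
keyword_weights = {"channel": 1.0, "before": 1.0}  # assumed weights

channel_score = cover_set_score({1, 2}, words, keyword_weights)   # "channel six"
time_score = cover_set_score({3, 4, 5}, words, keyword_weights)   # "before eight tonight"
stray_score = cover_set_score({2, 4}, words, keyword_weights)     # "six ... eight"
print(channel_score, time_score, stray_score)
```

A candidate that pairs "six" with "eight" across a gap and contains no keyword scores lower than either contiguous candidate, so it would be the first to be discarded.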



N-Best Candidates Selection 



At the end of each pass, an N-best processor 270 selects 
the N-best candidates based upon the scores associated with 
the tags and generates the topic-tags, each representing the 
information found in the corresponding parse-tree. Once 
topics have been discovered this way, the corresponding 
words in the input can be substituted with the tag information. 
This substitution transformation eliminates the corresponding 
words from the current input text. The output 280 
of each pass is fed back to the next pass as the new input, 
since the substitutions may help in the elimination of certain 
ambiguities among competing grammars or help generate 
better parse-trees by filtering out overlapping symbols. 

Computation ceases when no additional tags are gener- 
ated in the last pass. The output of the final pass becomes the 
output of the local parser to global parser 162. Since each 
phase can only reduce the number of words in its input and 
the length of the input text is finite, the number of passes in 
the fixed-point computation is linearly bounded by the size 
of its input. 
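
The fixed-point structure of the multi-pass computation can be sketched in Python. This is a simplified stand-in, not the generalized LR parsing described above: regular expressions play the role of the per-topic grammars, the longest match plays the role of the best-scoring candidate, and only one candidate is accepted per pass. The names `TOPIC_PATTERNS`, `run_pass` and `local_parse` are hypothetical.

```python
import re

# Assumed topic spotters: regexes stand in for the per-topic grammars.
TOPIC_PATTERNS = {
    "channel": re.compile(r"\bchannel (\d+)\b"),
    "time": re.compile(r"\b(?:before|after|around) (\d+)\b"),
}


def run_pass(text):
    """One pass: every topic proposes candidates; the longest match wins."""
    candidates = []
    for topic, pattern in TOPIC_PATTERNS.items():
        for match in pattern.finditer(text):
            candidates.append((len(match.group(0)), topic, match))
    if not candidates:
        return text, False
    _, topic, match = max(candidates, key=lambda c: c[0])
    tag = f"<{topic}:{match.group(1)}>"
    # Substitute the tag for the words it covers.
    return text[:match.start()] + tag + text[match.end():], True


def local_parse(text):
    """Repeat passes until a pass generates no additional tag."""
    passes = 0
    changed = True
    while changed:
        text, changed = run_pass(text)
        passes += 1
    return text, passes


sentence = "record channel 6 before 8"
result, passes = local_parse(sentence)
print(result, passes)
```

Each productive pass here replaces at least two words with one tag token, so the number of passes cannot exceed the number of input words plus the final pass that finds nothing, which mirrors the linear bound stated above.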

The following attributes of a parse-tree are used as scoring 
factors to rank the alternative parse trees: 

Number of terminal symbols. 

Number of non-terminal symbols. 

The depth of the parse-tree. 

The size of the gaps in the terminal symbols. 

ASR-Confidence measures associated with each terminal 
symbol. 

Context-adjustable weights associated with each terminal 
and non-terminal symbol. 
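
A minimal sketch of how these factors might be combined into a ranking is given below. The `ParseCandidate` class, the numeric weights in `WEIGHTS` and the multiplicative use of the ASR confidence and context weight are assumptions for illustration; the specification lists the factors but does not give a formula.

```python
import heapq
from dataclasses import dataclass


@dataclass
class ParseCandidate:
    topic: str
    terminals: int          # number of terminal symbols
    nonterminals: int       # number of non-terminal symbols
    depth: int              # depth of the parse-tree
    gap_size: int           # total size of gaps in the terminal symbols
    asr_confidence: float   # mean ASR confidence of the terminals
    context_weight: float = 1.0  # context-adjustable weight


# Assumed weights: coverage is rewarded, depth and gaps are penalized.
WEIGHTS = {"terminals": 1.0, "nonterminals": 0.5, "depth": -0.25, "gap_size": -1.0}


def score(c):
    structural = (WEIGHTS["terminals"] * c.terminals
                  + WEIGHTS["nonterminals"] * c.nonterminals
                  + WEIGHTS["depth"] * c.depth
                  + WEIGHTS["gap_size"] * c.gap_size)
    return structural * c.asr_confidence * c.context_weight


def n_best(candidates, n):
    """Return the n highest-scoring candidates, best first."""
    return heapq.nlargest(n, candidates, key=score)


candidates = [
    ParseCandidate("date", 2, 1, 1, 2, 0.9),
    ParseCandidate("time", 3, 2, 2, 0, 0.8, context_weight=2.0),
    ParseCandidate("channel", 2, 1, 1, 0, 0.9),
]
best = [c.topic for c in n_best(candidates, 2)]
print(best)
```

With these assumed weights the gapped "date" candidate is dropped and the context-boosted "time" candidate ranks first.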

Each path preferably corresponds to a separate topic that 
can be developed independently, operating on a small 
amount of data, in a computationally inexpensive way. The 
architecture of the parser 42 is flexible and modular, so 
incorporating additional paths and grammars for new 
topics, or changing heuristics for particular topics, is 
straightforward. This also allows developing reusable components 
that can be shared among different systems easily. 

From the foregoing it will be appreciated that the remote 
control system of the invention offers a great deal of 
user-friendly functionality not currently found in any elec- 
tronic program guide control system or remote control 
system. While the invention has been described in its 
presently preferred embodiment, it will be understood that 
the invention is capable of modification without departing 
from the spirit of the invention as set forth in the appended 
claims. 

What is claimed is: 

1. A remote control system for controlling at least one 
audio/video component comprising: 
a handheld case; 

a microphone disposed in said case for receiving speech 
input from a user; 

a communication system disposed in said case for trans- 
mitting data signals to a location remote from said 
handheld case; 

a speech recognizer for processing said speech input; 

a memory for storing a semantic representation of an 
electronic program guide; and 

a natural language parser in communication with said 
speech recognizer and with said memory, said parser 
being operative to extract semantic content from said 
processed speech input and to access semantic repre- 
sentation of said electronic program guide using said 
extracted semantic content to generate control instruc- 
tions for said audio/video component such that the 
natural language parser is a task-based parser employ- 
ing a grammar comprising a plurality of frames having 
slots representing semantic structure of said electronic 
program guide, wherein the natural language parser 
further comprises a local parser adapted to identify 
predetermined task -related fragments in said speech 
input, and a global parser adapted to receive task-related
fragments and to extract overall semantics from the
task-related fragments.

2. The remote control system of claim 1 wherein said
speech recognizer is disposed within said handheld case.

3. The remote control system of claim 1 further comprising:

a processor component remote from said handheld case
and wherein said speech recognizer is disposed in said
processor component.

4. The remote control system of claim 1 wherein said
natural language parser is disposed within said handheld
case.

5. The remote control system of claim 1 further comprising:

a processor component remote from said handheld case
and wherein said natural language parser is disposed in
said processor component.

6. The remote control system of claim 1 further comprising:

an electronic program guide acquisition system coupled to
said memory for downloading said representation of an
electronic program guide via a telecommunications
link.

7. The remote control system of claim 6 wherein said
telecommunications link is the internet.

8. The remote control system of claim 6 wherein said
telecommunications link is an audio/video program content
delivery system.

9. The remote control system of claim 1 wherein said
audio/video component includes a tuner and wherein said
remote control system communicates with said tuner to
acquire said representation of an electronic program guide.

10. The remote control system of claim 1 further comprising:

a dialog manager in communication with said parser for
generating prompts to the user based on said extracted
semantic content.

11. The remote control system of claim 1 further comprising:

a dialog manager having speech synthesizer for generating
speech prompts to the user based on said extracted
semantic content.

12. The remote control system of claim 1 further comprising:

a digitizing tablet disposed in said handheld case for
pen-based input of user-supplied information.

13. The remote control system of claim 12 wherein said
digitizing tablet displays prompts that are actuable by pen to
limit the context in which said parser extracts semantic
content.

14. The remote control system of claim 1 further comprising:

a display unit disposed in said handheld case for providing
information to the user.

15. A remote control device comprising:

a handheld case having a communication interface
through which control instructions are issued to a
remote component;

a display screen disposed in said case;

a microphone disposed in said case;

a speech recognizer system coupled to said microphone;

a user profile data store for storing information selected
from the group consisting of prior use information,
preference information and combinations thereof;

a dialog manager coupled to said speech recognizer
system, to said user profile data store, and to said
display screen for issuing control commands through
said communication interface and for displaying
information on said display screen based at least in part on
information obtained from said user profile data store;
and

a natural language parser in communication with said
speech recognizer system, said parser being operative
to extract semantic content from said processed
speech input and to access semantic representation of
an electronic program guide and using said extracted
semantic content to generate control instructions for
said remote control, such that the natural language
parser is a task-based parser employing a grammar
comprising a plurality of frames having slots representing
semantic structure of said electronic program guide,
wherein the natural language parser further comprises
a local parser adapted to identify predetermined
task-related fragments in speech input received from said
microphone, and a global parser adapted to receive
task-related fragments and to extract overall semantics
from the task-related fragments.

16. The remote control device of claim 15 wherein said
natural language parser having an associated data store
containing a representation of said electronic program guide,
and wherein said natural language parser selectively extracts
information from said program guide based on speech
information input received through said microphone.

17. The remote control device of claim 15 wherein said
speech recognizer system includes a data store containing a
representation of said electronic program guide and a system
for selectively updating the contents of said data store.

18. The remote control device of claim 17 wherein said
system for selectively updating the contents of said data
store includes a tuner for accessing a source of electronic
program guide information.

19. The remote control device of claim 17 wherein said
system for selectively updating the contents of said data
store includes an internet access system for accessing a
source of electronic program guide information.

20. The remote control device of claim 15 wherein said
speech recognizer has a first component disposed in said
handheld case and a second component disposed outside
said handheld case.

21. The remote control device of claim 20 wherein said
first component generates an audio domain signal for
transmission to said second component.

22. The remote control device of claim 20 wherein said
first component extracts speech parameters from input
speech from a user and transmits said parameters to said
second component for recognition.

23. The remote control device of claim 1 wherein said
recognizer has a first component disposed in said
handheld case and a second component disposed outside
said handheld case.

24. The remote control device of claim 23 wherein said
first component generates an audio domain signal for
transmission to said second component.

25. The remote control device of claim 23 wherein said
first component extracts speech parameters from input
speech from a user and transmits said parameters to said
second component for recognition.

26. A remote control system for controlling at least one
audio-video component comprising:

a handheld case;

a microphone disposed in said case for receiving speech
input from a user;

a communication system disposed in said case for
transmitting data signals to a location remote from said
handheld case;

a speech recognizer for converting said speech input to
text output;

a memory for storing a semantic representation of an
electronic program guide;

a natural language parser in communication with said
speech recognizer and with said memory, said parser 
being operative to extract semantic content from said 
processed speech input and to access semantic repre- 
sentation of said electronic program guide using said 
extracted semantic content to generate control instructions
for said audio/video component, such that the
natural language parser is a task-based parser employ- 
ing a grammar comprising a plurality of frames having 
slots representing semantic structure of said electronic 
program guide, wherein the natural language parser 
further comprises a local parser adapted to identify 
predetermined task-related fragments in said speech 
input, and a global parser adapted to receive task- 
related fragments and to extract overall semantics from 
the task-related fragments. 
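
By way of illustration only, the local parser, global parser, and slotted frames recited in the claims above might be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the invention: the fragment patterns, the slot names (genre, channel, time), the Frame class, the function names, and the sample guide entries are all hypothetical and do not appear in the claims.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Frame:
    """Hypothetical frame: a task plus slots mirroring program guide fields."""
    task: str
    slots: dict = field(default_factory=dict)


# Hypothetical patterns for task-related fragments (assumed for illustration).
FRAGMENT_PATTERNS = {
    "action": re.compile(r"\b(record|watch)\b"),
    "genre": re.compile(r"\b(movie|comedy|news|sports)\b"),
    "channel": re.compile(r"\bchannel (\d+)\b"),
    "time": re.compile(r"\b(tonight|tomorrow)\b"),
}


def local_parse(text):
    """Local parser: identify task-related fragments as (tag, value) pairs."""
    lowered = text.lower()
    fragments = []
    for tag, pattern in FRAGMENT_PATTERNS.items():
        match = pattern.search(lowered)
        if match:
            fragments.append((tag, match.group(1)))
    return fragments


def global_parse(fragments):
    """Global parser: combine fragments into one frame (overall semantics)."""
    found = dict(fragments)
    frame = Frame(task=found.pop("action", "unknown"))
    frame.slots.update(found)
    return frame


def query_guide(frame, guide):
    """Return guide entries whose fields match every filled slot."""
    return [entry for entry in guide
            if all(entry.get(key) == value
                   for key, value in frame.slots.items())]


# Hypothetical program guide representation (sample data, not from the source).
GUIDE = [
    {"title": "Evening Report", "genre": "news", "channel": "4", "time": "tonight"},
    {"title": "Laugh Hour", "genre": "comedy", "channel": "7", "time": "tonight"},
]

frame = global_parse(local_parse("Please record the comedy on channel 7 tonight"))
matches = query_guide(frame, GUIDE)
```

In this sketch the filled frame carries the task together with the slot values, and the slots are then matched against the guide entries to select the program on which a control instruction would act.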






